Multi-tasking
Volume Number: 2
Issue Number: 4
Column Tag: Threaded Code: Mach 1
A Multi-tasking Forth System
By Jörg Langowsk, EMBL, C/O I.L.L., Grenoble, Cedex, France,
MacTutor Editorial Board
"Subroutine threaded Forth - the Mach 1 system
This time I am going to introduce to you a new Forth-83 system that I recently
got to review. This system is implemented in a somewhat different way compared with
MacForth, NEON or other Forths, so let me digress a little and make some comments
about Forth implementations on 68000 based systems.
As most of this column's readers know, 'threaded code' means program code that
is organized in a very hierarchical manner; a 'main program' or colon definition
consists of a list of words that describe other lower-level definitions that in turn are
lists of even lower level definitions and so on ... until one arrives at the definitions of
the very lowest level that are directly executable machine code (See Figure 1 below).
How do Forth systems implement this structure? The definition of every Forth
word, colon definition, variable, etc. contains one short piece of executable machine code
at the very beginning. This is called the code field and determines in which way the
information following it is handled. For example, in a colon definition in MacForth, the
code field contains a TRAP #F (V2.0) or TRAP #E (V2.4) instruction that jumps to
code that in turn:
-pushes the instruction pointer (A3) on the return stack,
-sets the instruction pointer to the address of the first word of the new definition to
be executed,
-jumps to the NEXT routine which executes the tokens that follow the code field.
Mac Forth is an example of 'token threaded Forth' which means that the numbers
following the code field are not valid addresses of other Forth words, but 'tokens' that
are to be converted into addresses by some algorithm. In Mac Forth, positive token
values are interpreted as offsets from a base pointer (A4), while negative token values
are negative offsets from the top of application space (A5) into a token table, where the
32-bit address of the Forth word is found. The token table is used to handle programs
that exceed the 32 K address space that can be handled through the first method of
addressing.
The advantage of MacForth's and similar structures is that we need only 16 bits
for each word compiled into a definition. The disadvantage is the limited address space
(32K) for the simple A4-indexed addressing mode and the need for a token table to
convert tokens into addresses that lie outside the 32K addressing region. Time is lost on
each NEXT loop by having to test whether the token is negative or positive, then
eventually by accessing the token table ( See Figure 2 ).
The Forth implementation that was used to create NEON does not need a token
table since each word uses 32 bits, but more space is used in the compiled definitions
that way.
Fig. 3 gives a summary of some possible addressing modes for Forth interpreters
with their execution times (in system clocks, from the MC68000 manual). Some of
these examples have been taken from a Dr. Dobb's Journal article by Joe Barnhart,
'Forth and the Motorola 68000', DDJ 83 (1983) 18-26.
MacForth does not conform to any of the 'pure' definitions here, but (cf. Fig. 2)
is a mixture of a base-offset direct word and base-offset indirect long threaded code
(with an additional branch required).
You see that in all the examples given the inner interpreter requires
considerable overhead to 'set up' the forth words for execution.
Additional overhead is created by the execution of the small piece of machine code
starting at the CFA. In MacForth this is a TRAP instruction (38 cycles) plus three lines
of assembly code that the trap vector points to (32 cycles). Counting everything up,
execution of one colon-defined Forth word requires 100 or 120 cycles under
MacForth, depending whether or not the token table has to be accessed. In a system like
NEON, the CFA does not contain a trap, but a JSR to a routine that sets up the inner
interpreter, with a similar amount of time involved.
One solution around this problem is to code time-critical things in assembly
language, so that the whole definition is made up of executable 68000 code, starting
with the code field. I have given an example for this in MT V1#9 by defining macro
words that compile inline code into a definition. The speedup was of the order of 50% for
MacForth. Of course, if you write inline macros, you are not really creating threaded
code, but linear code, and the size of the definition tends to get rather large.
Still, there is another way to create threaded, i.e. hierarchical, code that does
away with the needs for an 'inner interpreter' loop completely, since the threaded code
will be directly executable 68000 machine language. This sounds strange, but is
actually very simple to achieve. The keyword is 'subroutine threading'. As you saw
from the inline code example, we can do without a 'code field' by just starting to execute
our definition at the very beginnning, provided it is all bona fide 68000 code. Now, all
we have to do is to persuade the Forth system not to compile a list of addresses into a
colon definition, but a list of JSR (addr). This way, we don't need the inner interpreter
at all, Forth instructions are 68000 instructions; the subroutine return stack, in this
case, is the normal 68000 stack, pointed to by A7 (Mach 1 uses still a different stack